Overview

Dataset statistics

Number of variables: 11
Number of observations: 382154
Missing cells: 719119
Missing cells (%): 17.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 32.1 MiB
Average record size in memory: 88.0 B
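
The percentage and memory figures above can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes that every column stores 8 bytes per row (a 64-bit number or an object pointer), which is an assumption about how the data was held in memory, not something the report states.

```python
# Figures copied from the dataset statistics above.
n_variables = 11
n_observations = 382154
missing_cells = 719119

# Missing cells are reported as a share of all cells in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# ASSUMPTION: 8 bytes per value in every column (shallow size).
bytes_per_value = 8
record_bytes = n_variables * bytes_per_value
total_mib = record_bytes * n_observations / 1024 ** 2
column_mib = bytes_per_value * n_observations / 1024 ** 2

print(f"total cells:        {total_cells}")
print(f"missing cells (%):  {missing_pct:.1f}")
print(f"record size (B):    {record_bytes}")
print(f"total size (MiB):   {total_mib:.1f}")
print(f"column size (MiB):  {column_mib:.1f}")
```

Under that assumption the results agree with the reported 17.1%, 88.0 B, 32.1 MiB, and the 2.9 MiB memory size listed for each variable.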

Variable types

NUM: 6
CAT: 3
BOOL: 2

Warnings

Tanggal_Asuransi has a high cardinality: 848 distinct values [High cardinality]
Gender has 31768 (8.3%) missing values [Missing]
Umur has 96258 (25.2%) missing values [Missing]
Izin_Mengemudi has 76647 (20.1%) missing values [Missing]
Kode_Wilayah has 84074 (22.0%) missing values [Missing]
Tanggal_Asuransi has 78084 (20.4%) missing values [Missing]
Tahun_Kendaraan has 66440 (17.4%) missing values [Missing]
Biaya has 126537 (33.1%) missing values [Missing]
Sourcing_Channel has 83645 (21.9%) missing values [Missing]
Hari_Diasuransikan has 75666 (19.8%) missing values [Missing]
id has unique values [Unique]

Reproduction

Analysis started: 2021-03-21 09:27:03.984535
Analysis finished: 2021-03-21 09:27:43.899546
Duration: 39.92 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

id
Real number (ℝ≥0)

UNIQUE

Distinct: 382154
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 234392.9535
Minimum: 1
Maximum: 508145
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.9 MiB

Quantile statistics

Minimum: 1
5th percentile: 22939.65
Q1: 115006.25
Median: 230461.5
Q3: 345434.75
95th percentile: 471209.05
Maximum: 508145
Range: 508144
Interquartile range (IQR): 230428.5

Descriptive statistics

Standard deviation: 139527.4873
Coefficient of variation (CV): 0.5952716806
Kurtosis: -1.058717575
Mean: 234392.9535
Median Absolute Deviation (MAD): 115213
Skewness: 0.1322865442
Sum: 8.957420474e+10
Variance: 1.946791972e+10
Monotonicity: Not monotonic
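
Several of the figures reported for id are simple functions of the others. The following sketch recomputes them from the rounded values printed in this report, so small rounding differences are expected; the variable names are illustrative only.

```python
# Values copied from the id statistics in this report (already rounded).
n = 382154                      # id has no missing values
mean = 234392.9535
std = 139527.4873
q1, q3 = 115006.25, 345434.75
minimum, maximum = 1, 508145

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance is the squared standard deviation
total = mean * n                 # Sum is the mean times the number of values

print(f"Range:    {value_range}")
print(f"IQR:      {iqr}")
print(f"CV:       {cv:.10f}")
print(f"Variance: {variance:.9e}")
print(f"Sum:      {total:.9e}")
```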
Histogram with fixed size bins (bins=50)
Value                   Count    Frequency (%)
4094                    1        < 0.1%
432518                  1        < 0.1%
90489                   1        < 0.1%
96634                   1        < 0.1%
94587                   1        < 0.1%
84348                   1        < 0.1%
82301                   1        < 0.1%
88446                   1        < 0.1%
86399                   1        < 0.1%
434561                  1        < 0.1%
Other values (382144)   382144   > 99.9%

Value                   Count    Frequency (%)
1                       1        < 0.1%
3                       1        < 0.1%
4                       1        < 0.1%
6                       1        < 0.1%
9                       1        < 0.1%

Value                   Count    Frequency (%)
508145                  1        < 0.1%
508144                  1        < 0.1%
508143                  1        < 0.1%
508141                  1        < 0.1%
508140                  1        < 0.1%

Gender
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 31768
Missing (%): 8.3%
Memory size: 2.9 MiB
Value                   Count    Frequency (%)
Pria                    192814   50.5%
Wanita                  157572   41.2%
(Missing)               31768    8.3%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 6
Median length: 4
Mean length: 4.741523051
Min length: 3
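
A minimum length of 3 may look odd when the only categories are "Pria" (4 characters) and "Wanita" (6 characters). One explanation that is consistent with the numbers is that missing values were converted to the string "nan" before lengths were measured. The sketch below tests that hypothesis against the reported figures; the placeholder string is an assumption, not something the report states.

```python
import statistics

# Category counts copied from the Gender frequency table.
counts = {"Pria": 192814, "Wanita": 157572}
missing = 31768

# ASSUMPTION: each missing value is measured as the 3-character string "nan".
placeholder = "nan"

# One length per row, including the rows with missing values.
lengths = []
for value, count in counts.items():
    lengths.extend([len(value)] * count)
lengths.extend([len(placeholder)] * missing)

mean_length = sum(lengths) / len(lengths)
median_length = statistics.median(lengths)

print(f"rows:          {len(lengths)}")
print(f"min length:    {min(lengths)}")
print(f"median length: {median_length}")
print(f"mean length:   {mean_length:.9f}")
print(f"max length:    {max(lengths)}")
```

Under this assumption the computed values match the reported minimum, median, mean and maximum lengths.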

Umur
Real number (ℝ≥0)

MISSING

Distinct: 66
Distinct (%): < 0.1%
Missing: 96258
Missing (%): 25.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.91659205
Minimum: 20
Maximum: 85
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.9 MiB

Quantile statistics

Minimum: 20
5th percentile: 21
Q1: 24
Median: 33
Q3: 52
95th percentile: 71
Maximum: 85
Range: 65
Interquartile range (IQR): 28

Descriptive statistics

Standard deviation: 16.70679967
Coefficient of variation (CV): 0.4292976026
Kurtosis: -0.8350756175
Mean: 38.91659205
Median Absolute Deviation (MAD): 11
Skewness: 0.6477650251
Sum: 11126098
Variance: 279.1171551
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                   Count    Frequency (%)
24                      23033    6.0%
23                      21560    5.6%
25                      18636    4.9%
22                      18367    4.8%
21                      13996    3.7%
26                      9776     2.6%
27                      7748     2.0%
28                      6449     1.7%
50                      6316     1.7%
51                      5789     1.5%
Other values (56)       154226   40.4%
(Missing)               96258    25.2%

Value                   Count    Frequency (%)
20                      5735     1.5%
21                      13996    3.7%
22                      18367    4.8%
23                      21560    5.6%
24                      23033    6.0%

Value                   Count    Frequency (%)
85                      10       < 0.1%
84                      14       < 0.1%
83                      24       < 0.1%
82                      32       < 0.1%
81                      52       < 0.1%

Izin_Mengemudi
Boolean

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 76647
Missing (%): 20.1%
Memory size: 2.9 MiB
Value                   Count    Frequency (%)
1                       305145   79.8%
0                       362      0.1%
(Missing)               76647    20.1%

Kode_Wilayah
Real number (ℝ≥0)

MISSING

Distinct: 53
Distinct (%): < 0.1%
Missing: 84074
Missing (%): 22.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.40603194
Minimum: 0
Maximum: 52
Zeros: 1426
Zeros (%): 0.4%
Memory size: 2.9 MiB

Quantile statistics

Minimum: 0
5th percentile: 6
Q1: 15
Median: 28
Q3: 35
95th percentile: 47
Maximum: 52
Range: 52
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 13.16317898
Coefficient of variation (CV): 0.4984913678
Kurtosis: -0.8581445102
Mean: 26.40603194
Median Absolute Deviation (MAD): 10
Skewness: -0.1174582097
Sum: 7871110
Variance: 173.2692808
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                   Count    Frequency (%)
28                      83857    21.9%
8                       26475    6.9%
46                      15718    4.1%
41                      14847    3.9%
15                      10164    2.7%
30                      9934     2.6%
29                      9110     2.4%
50                      7911     2.1%
11                      7302     1.9%
3                       7271     1.9%
Other values (43)       105491   27.6%
(Missing)               84074    22.0%

Value                   Count    Frequency (%)
0                       1426     0.4%
1                       724      0.2%
2                       2868     0.8%
3                       7271     1.9%
4                       1398     0.4%

Value                   Count    Frequency (%)
52                      210      0.1%
51                      154      < 0.1%
50                      7911     2.1%
49                      1342     0.4%
48                      3316     0.9%

Tanggal_Asuransi
Categorical

HIGH CARDINALITY
MISSING

Distinct: 848
Distinct (%): 0.3%
Missing: 78084
Missing (%): 20.4%
Memory size: 2.9 MiB
Value                   Count    Frequency (%)
7/29/2019               468      0.1%
10/4/2019               467      0.1%
2/7/2020                454      0.1%
2/18/2020               454      0.1%
2/6/2020                453      0.1%
2/23/2020               452      0.1%
1/3/2020                452      0.1%
8/1/2019                448      0.1%
8/23/2019               447      0.1%
9/6/2019                447      0.1%
Other values (838)      299528   78.4%
(Missing)               78084    20.4%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 10
Median length: 9
Mean length: 7.716483407
Min length: 3

Tahun_Kendaraan
Categorical

MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 66440
Missing (%): 17.4%
Memory size: 2.9 MiB
Value                   Count    Frequency (%)
1-2 Tahun               150132   39.3%
<1 Tahun                149957   39.2%
>2 Tahun                15625    4.1%
(Missing)               66440    17.4%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 9
Median length: 8
Mean length: 7.523574266
Min length: 3

Biaya
Real number (ℝ≥0)

MISSING

Distinct: 46453
Distinct (%): 18.2%
Missing: 126537
Missing (%): 33.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 31183.75678
Minimum: 2630
Maximum: 540165
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.9 MiB

Quantile statistics

Minimum: 2630
5th percentile: 2630
Q1: 24426
Median: 31887
Q3: 40007
95th percentile: 59165.2
Maximum: 540165
Range: 537535
Interquartile range (IQR): 15581

Descriptive statistics

Standard deviation: 18392.30559
Coefficient of variation (CV): 0.5898040354
Kurtosis: 36.42634577
Mean: 31183.75678
Median Absolute Deviation (MAD): 7791
Skewness: 2.148623328
Sum: 7971098357
Variance: 338276904.8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                   Count    Frequency (%)
2630                    44088    11.5%
69856                   133      < 0.1%
45179                   38       < 0.1%
38452                   35       < 0.1%
70720                   33       < 0.1%
72544                   29       < 0.1%
38287                   28       < 0.1%
31102                   25       < 0.1%
36086                   24       < 0.1%
34885                   24       < 0.1%
Other values (46443)    211160   55.3%
(Missing)               126537   33.1%

Value                   Count    Frequency (%)
2630                    44088    11.5%
6466                    1        < 0.1%
9816                    1        < 0.1%
10004                   1        < 0.1%
10148                   1        < 0.1%

Value                   Count    Frequency (%)
540165                  4        < 0.1%
508073                  1        < 0.1%
495106                  1        < 0.1%
472042                  4        < 0.1%
448156                  1        < 0.1%

Sourcing_Channel
Real number (ℝ≥0)

MISSING

Distinct: 153
Distinct (%): 0.1%
Missing: 83645
Missing (%): 21.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 110.8720072
Minimum: 1
Maximum: 163
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.9 MiB

Quantile statistics

Minimum: 1
5th percentile: 26
Q1: 26
Median: 152
Q3: 152
95th percentile: 160
Maximum: 163
Range: 162
Interquartile range (IQR): 126

Descriptive statistics

Standard deviation: 57.8626207
Coefficient of variation (CV): 0.5218866526
Kurtosis: -1.282168011
Mean: 110.8720072
Median Absolute Deviation (MAD): 8
Skewness: -0.7751767326
Sum: 33096292
Variance: 3348.082875
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                   Count    Frequency (%)
152                     120260   31.5%
26                      70813    18.5%
124                     26238    6.9%
160                     21045    5.5%
156                     10106    2.6%
157                     6739     1.8%
154                     5883     1.5%
122                     4952     1.3%
151                     3279     0.9%
163                     2972     0.8%
Other values (143)      26222    6.9%
(Missing)               83645    21.9%

Value                   Count    Frequency (%)
1                       948      0.2%
2                       4        < 0.1%
3                       505      0.1%
4                       515      0.1%
6                       4        < 0.1%

Value                   Count    Frequency (%)
163                     2972     0.8%
160                     21045    5.5%
159                     52       < 0.1%
158                     492      0.1%
157                     6739     1.8%

Hari_Diasuransikan
Real number (ℝ≥0)

MISSING

Distinct: 290
Distinct (%): 0.1%
Missing: 75666
Missing (%): 19.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 154.1689952
Minimum: 10
Maximum: 299
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.9 MiB

Quantile statistics

Minimum: 10
5th percentile: 24
Q1: 81
Median: 154
Q3: 227
95th percentile: 285
Maximum: 299
Range: 289
Interquartile range (IQR): 146

Descriptive statistics

Standard deviation: 83.72084959
Coefficient of variation (CV): 0.5430459574
Kurtosis: -1.201810651
Mean: 154.1689952
Median Absolute Deviation (MAD): 73
Skewness: 0.004187327762
Sum: 47250947
Variance: 7009.180657
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                   Count    Frequency (%)
256                     1167     0.3%
54                      1146     0.3%
73                      1145     0.3%
56                      1132     0.3%
63                      1131     0.3%
31                      1131     0.3%
22                      1127     0.3%
160                     1126     0.3%
37                      1124     0.3%
13                      1123     0.3%
Other values (280)      295136   77.2%
(Missing)               75666    19.8%

Value                   Count    Frequency (%)
10                      1104     0.3%
11                      1084     0.3%
12                      1008     0.3%
13                      1123     0.3%
14                      995      0.3%

Value                   Count    Frequency (%)
299                     1026     0.3%
298                     1082     0.3%
297                     1020     0.3%
296                     1071     0.3%
295                     1041     0.3%

Target
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB
Value                   Count    Frequency (%)
0                       319553   83.6%
1                       62601    16.4%

Interactions

[36 interaction plots; images not preserved]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
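
As a minimal sketch of this calculation (not the code the report itself runs), the function below divides the covariance by the product of the standard deviations. The 1/n factors cancel, so plain sums are used. The function name and the toy data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations; the common 1/n factor cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


# Toy data (hypothetical): y is an exact linear function of x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
print(pearson_r(x, y))

# Shifting and rescaling x by a positive factor leaves r unchanged.
print(pearson_r([10 * a + 5 for a in x], y))
```

Both calls should print a value of 1 up to floating-point rounding, illustrating the invariance under changes in location and scale.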

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
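
The rank-based calculation can be sketched as follows: replace each value by its rank (giving tied values their average rank), then apply the Pearson formula to the ranks. This is an illustrative sketch with hypothetical function names and toy data, not the report's own implementation.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): a monotonic but nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]
print("Spearman:", spearman_rho(x, y))
print("Pearson: ", pearson_r(x, y))
```

On this toy data ρ is 1 up to rounding because the relationship is perfectly monotonic, while Pearson's r is below 1 because it is not linear.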
2021-03-21T16:27:49.223570image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
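
A direct, quadratic-time sketch of the formula as described is shown below. It implements the plain version in which tied pairs count as neither concordant nor discordant; tie-corrected variants such as tau-b exist and may be what a given library computes, so results can differ when the data contains ties. The function name and toy data are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Plain Kendall tau: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = 0
    for i, j in combinations(range(len(x)), 2):
        pairs += 1
        # The sign of the product tells whether both variables move the same way.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 is a tie and counts as neither.
    return (concordant - discordant) / pairs


# Toy data (hypothetical): one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(kendall_tau(x, y))  # 5 concordant, 1 discordant, 6 pairs
```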

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
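
The bias correction can be sketched as below, following the correction as it is commonly stated: the chi-squared based φ² and the table dimensions are each shrunk by a term that depends on the sample size. This is an illustrative sketch operating on a contingency table of counts; the function name and toy tables are hypothetical, and the report's own implementation may differ in detail.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction terms: shrink phi2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Toy tables (hypothetical counts).
independent = [[10, 20], [20, 40]]   # rows are proportional to each other
perfect = [[50, 0], [0, 50]]         # each row maps to exactly one column
print(cramers_v_corrected(independent))
print(cramers_v_corrected(perfect))
```

The first table gives 0 (independence) and the second gives 1 up to rounding (perfect association), matching the range described above.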

Missing values

[4 missing-value plots; images not preserved]

Sample

First rows

   id      Gender  Umur  Izin_Mengemudi  Kode_Wilayah  Tanggal_Asuransi  Tahun_Kendaraan  Biaya    Sourcing_Channel  Hari_Diasuransikan  Target
0  58609   Pria    65.0  1.0             48.0          11/4/2018         NaN              2630.0   15.0              131.0               0
1  208222  Wanita  22.0  1.0             21.0          2/2/2018          <1 Tahun         NaN      NaN               NaN                 0
2  345428  Wanita  24.0  1.0             NaN           5/12/2019         <1 Tahun         NaN      NaN               181.0               0
3  236831  Pria    58.0  1.0             46.0          NaN               1-2 Tahun        NaN      124.0             NaN                 0
4  280181  Pria    NaN   1.0             36.0          11/19/2019        >2 Tahun         NaN      NaN               NaN                 1
5  31680   Pria    55.0  1.0             28.0          12/10/2019        1-2 Tahun        54135.0  52.0              285.0               0
6  52488   NaN     23.0  1.0             50.0          NaN               <1 Tahun         NaN      NaN               145.0               0
7  278334  Pria    72.0  1.0             28.0          8/26/2019         >2 Tahun         NaN      122.0             242.0               0
8  129322  Pria    23.0  NaN             NaN           8/16/2019         <1 Tahun         33007.0  124.0             NaN                 0
9  316145  Pria    NaN   1.0             NaN           NaN               1-2 Tahun        53322.0  NaN               NaN                 0

Last rows

        id      Gender  Umur  Izin_Mengemudi  Kode_Wilayah  Tanggal_Asuransi  Tahun_Kendaraan  Biaya    Sourcing_Channel  Hari_Diasuransikan  Target
382144  478163  Pria    55.0  1.0             NaN           3/30/2019         NaN              NaN      NaN               152.0               0
382145  253277  Pria    49.0  1.0             28.0          8/11/2018         1-2 Tahun        NaN      30.0              84.0                0
382146  93297   Pria    51.0  1.0             33.0          6/17/2018         NaN              NaN      NaN               NaN                 0
382147  335745  Pria    21.0  1.0             47.0          NaN               <1 Tahun         NaN      160.0             NaN                 0
382148  178224  Wanita  NaN   1.0             28.0          12/13/2019        NaN              22158.0  NaN               40.0                0
382149  255964  Pria    52.0  1.0             28.0          NaN               >2 Tahun         NaN      NaN               217.0               1
382150  102144  Pria    23.0  1.0             NaN           8/27/2018         <1 Tahun         29282.0  152.0             260.0               0
382151  480784  Pria    NaN   1.0             3.0           9/12/2019         NaN              29217.0  NaN               NaN                 1
382152  321214  NaN     51.0  1.0             NaN           8/8/2019          NaN              42063.0  26.0              148.0               0
382153  372274  Pria    57.0  1.0             NaN           10/24/2019        1-2 Tahun        NaN      26.0              215.0               0